Under what conditions is an edge present in a social network at time t likely to decay or persist by some future time t + Δt? 
Previous research addressing this issue suggests that the network range of the people involved in the edge, the extent to which the 
edge is embedded in a surrounding structure, and the age of the edge all play a role in edge decay. This paper uses weighted data 
from a large-scale social network built from cell-phone calls in an 8-week period to determine the importance of edge weight for 
the decay/persistence process. In particular, we study the relative predictive power of directed weight, embeddedness, newness, 
and range (measured as outdegree) with respect to edge decay and assess the effectiveness with which a simple decision tree and 
logistic regression classifier can accurately predict whether an edge that was active in one time period continues to be so in a future 
time period. We find that directed edge weight, weighted reciprocity and time-dependent measures of edge longevity are highly 
predictive of whether we classify an edge as persistent or decayed, relative to the other types of factors at the dyad and neighborhood 
level. 
1. Introduction 

Under what conditions are particular social connections more or less likely to dissolve over time? Most network analysts agree that the issue of the dynamic stability of social relationships embedded in networks is a fundamental one (Suitor et al. 1997; Wellman et al. 1997; Feld et al. 2007; Bidart and Degenne 2005). One obvious reason for the centrality of relationship dynamics is that essentially all of the classic behavioral theories in the network tradition, such as balance (Heider 1958; Davis 1963) and exchange theory (Emerson 1972), can be productively considered theories about the relative likelihood that some edges will persist and other edges will be dissolved (Hallinan 1978). For instance, classic balance-theoretic analyses of the dynamics of reciprocity suggest that the reason why we are more likely to observe tendencies toward reciprocity in human social networks is precisely because unreciprocated edges have a shorter lifespan (they are more likely to be dissolved by the unreciprocated party) and are thus [...]: the reason why the intransitive "forbidden triad" is rare is precisely because dyads embedded in fully-reciprocated triads are expected to be less likely to decay over time (Davis 1967), a proposition that has received some empirical confirmation by Burt (2000). While much attention has been paid to the emergence of transitivity in social networks through a process of meeting through an intermediary, it is clear that thinking dynamically about the persistence of transitivity in social networks, through the selective dissolution of relationships not embedded in triads, transforms this into a problem of accounting for the structural precursors of edge decay. This also implies that, empirically, "bridges" across transitive clusters should decay at a faster rate than other types of edges (Burt 2002).

In addition to these theoretical considerations, there are several substantive and practical motivations for the attempt to make progress in predicting edge decay and persistence. First, at the level of the whole network, edge decay may signal changing community structure (Tantipathananandh et al. 2007). From an ego-centric perspective, if a given actor experiences high levels of volatility and decay in his or her current relationships, this may indicate that he or she is moving between peer groups or undergoing a major life change (Suitor and Keeton 1997; Feld et al. 2007; Bidart and Lavenu 2005). Second, relationships that are identified as likely to decay may under some circumstances (e.g. when there is a need to binarize a weighted matrix) be better thought of as "false positives." In passively-collected behavioral data such as email and cell phone communications (Kossinets 2006; Hidalgo and Rodriguez-Sickert 2008), the source of data on which we rely in the analysis below, the notion of what exactly constitutes an edge is somewhat unclear. Being able to predict edge decay may shed light on the circumstances under which an edge can be considered "real" for the purposes of further analysis.

More recently, with the increasing availability of longitudinal social network data, the temporal evolution of social networks is receiving increasing attention (Burt 2000, 2002; Wellman et al. 1997; Bidart and Degenne 2005; Bidart and Lavenu 2005). This has been aided by the recent development of actor-oriented, stochastic approaches for the analysis of longitudinal network data (van de Bunt et al. 1999; Snijders 2005), which couple the evolution of micro-structures with agent-level attributes and behavioral outcomes (see Snijders et al. 2010 for a recent review).

However, in spite of its centrality for the main lines of theory in network analysis, the dynamics of link decay remains a relatively understudied phenomenon, especially at the level of behavioral observation. In this respect, the main roadblock to a better understanding of the dynamics that drive patterns of decay of social edges in networks has been the relative paucity of large-scale, ecologically reliable data on social interactions (Eagle et al. 2008). The methodological and measurement issues associated with dynamic network data collected from informant reports on who they are connected to are well-known and well-documented, so there is no need to rehearse them extensively here (Bernard et al. 1984; Krackhardt 1987). These include: (1) systematic measurement error introduced by constraints on validity due to informant recall biases (Brewer 2000; Brewer and Webster 2000; Marin 2004), (2) measurement constraints introduced by reliance on so-called fixed-choice designs to accommodate respondents' memory limitations and stamina (Feld and Carter 2002; Kossinets 2006), (3) validity limits introduced by data collection strategies that are limited (due to cost and the relative obtrusiveness of sociometric questionnaires) to small samples constrained to specific sites (Laumann et al. 1989), and (4) limitations in the ability to measure the volume and frequency of communicative activity that flows through an edge, with most studies being relegated to using standard binary networks in which links are thought to be either present or absent (Opsahl and Panzarasa 2009; Hammer 1985).

In the decay and formation dynamics of social relationships, these well-known limitations acquire renewed importance for three reasons. First, as large-scale data on human communication (sometimes containing thousands, and in our case millions, of actors) begin to accumulate, examining the extent to which standard analytic approaches can be used to account for empirical dynamics in this domain becomes a primary concern. Second, when considering the issue of link decay, the problem of biases produced by memory limitations, artificial upper bounds on actors' degree produced by survey design strictures, and the selective reporting of those contacts most subjectively (or objectively) important becomes an issue of substantive and methodological significance (Holland and Leinhardt 1973; Kossinets 2006; Kossinets and Watts 2006). For while it is unlikely that persons will misreport being connected to those with whom they interact most often (Hammer 1985; Freeman et al. 1987), by selectively collecting data on ego's strongest edges, survey-based methods are likely to give undue consideration to precisely those links that are least likely to decay. Finally, ignoring the fact that most real-world communication networks are not binary (each edge is instead "weighted" differently depending on the amount of communicative activity that flows through it; Barrat et al. 2004) can impose artificial limits on our ability to predict which links are more likely to decay and which ones are more likely to remain. In this paper we use behavioral network data on a large-scale sample of communicative interactions, obtained unobtrusively from cell-phone communication records, to study the dynamic and structural processes that govern link decay. One advantage of the data that we use below is that they consist of weighted links based on dyadic communication frequency.

It is of course not our intention to suggest that data obtained from cellular communication records are themselves devoid of bias, or that previous research using self-report data does not constitute a solid foundation on which to build. In fact, we rely on research and theory from such studies in the analysis that follows. Cellular communication data are certainly not a direct reflection of the underlying social network. Communicating by phone is only one of a large menu of possible ways in which two persons may be connected, and in fact many persons share strong connections without necessarily talking over the phone. In addition, just as informants may fail to mention their least important ties, rare behavioral events (e.g. contacting somebody whom you only talk to once a year) will also be absent from observational data unless very long observation windows are used, thus producing a similar observational bias keyed to relative strength.

It is our contention, however, that data obtained from spontaneous behavioral interactions will produce dynamical patterns that may be closer to those that govern the formation, sustenance and decay of human social relationships in "the wild" (Hammer 1985). As such, they are an important resource for establishing the structural and dynamic properties of large-scale social networks. We already know that data of this type have high ecological validity, in that cell-phone mediated interaction accurately predicts face-to-face interaction and self-reported friendship as measured using traditional sociometric methods (Eagle et al. 2009). With penetration rates close to 100% in industrialized countries such as the one from which these data were collected (Onnela et al. 2007), cell-phone communications are also generally free of the socio-demographic biases that plague studies relying on modes of communication that have yet to achieve comparable levels of universal usage (such as email or chat). Onnela et al. (2007) examined basic topological properties of a cell-phone communication network similar to ours, and found it to display some basic signatures specific to social networks (e.g. small mean path length, high clustering, community structure, and large inequalities in connectivity across vertices).

This paper makes several contributions to the literature. First, on the substantive side, we incorporate insights and mechanisms from previous studies of network evolution to understand processes of link decay. In addition, we bring into consideration dyad-level processes, such as the degree of reciprocity, that have not yet been considered in studies of edge decay (mostly because the data used in those studies are binary rather than weighted). On the methodological side, we introduce supervised learning techniques from the computer science literature for the study of social network evolution. These techniques are appropriate for discovering patterns in data of the size and scope with which we are faced here (millions of persons and tens of millions of communication events), both extending and complementing the more traditional regression-based techniques that have been used to tackle this problem in the existing literature (e.g. Burt 2000, 2002). Machine learning algorithms allow us to ascertain the relative importance of individual, dyadic and local-structural information in lowering or increasing the likelihood of link decay without incorporating strong assumptions about functional form (they are "non-parametric" in this respect) or about the homogeneity of effect sizes across the relevant feature space.

The remainder of the paper is organized as follows. In the following section we briefly review previous research on edge decay in social networks. In Section 3 we connect the substantive concern with identifying the factors that lead to link decay in social networks with the largely methodological literature on the link prediction problem in computer science, and explain how we partially adapt these tools to the task at hand. In Section 4 we go on to review previous work on the dynamics of social relationships in large-scale networks. In Section 5 we describe the data on which we conducted this study and formally define each of the problems we consider. Section 7 describes basic topological and distributional features of our main predictors. In Section 8 we examine the correlation structure among the network features that we choose for the prediction task. In Section 9 we present the results, identifying which network features are the strongest predictors of edge decay. In Section 10 we analyze the classifiers' performance and explore their comparative fit. Finally, in Section 11 we discuss the substantive implications of our results, draw conclusions, and lay out potential avenues for future research.

2. Correlates of Edge Decay in Social Networks 

A great deal of effort has gone into characterizing the growth of networks, either with high-level generative models (see Chakrabarti and Faloutsos 2006 for a survey) or by analyzing the formation of individual links (Hays 1984; Marmaros and Sacerdote 2006). Comparatively little work has been done on decay dynamics in large-scale networks with an already existing structure: the processes by which individual actors leave the network or individuals sever edges. The most prominent work on the issue of edge decay in social networks is that of Burt (2000, 2002), who studies the social networks of prominent bankers over time and analyzes the factors that contribute to the disappearance of edges. Specifically, prominent bankers within an organization were asked, once a year for four years, to name other bankers from the same organization with whom they had had "frequent and substantial business contact" over the previous year. Two main substantive conclusions emerge from this analysis:

1. Several factors influence edge decay, including homophily (similarity between people), embeddedness (mutual acquaintances), status (e.g. network range), and experience.

2. Links exhibit a "liability of newness", meaning that newly-formed links decay more quickly than links that have existed for a long time.

These observations seem to lay out a framework for predicting link decay (and, by implication, link persistence), and that is precisely the chief question of this paper: What are the vertex-level, dyad-level and local-structural features that can be used to most accurately predict edge decay? A formal statement of this research question gives rise to what we will call the decay prediction problem: Given the activity within a social network in a time period τ1, how accurately can we predict whether a given edge will persist or decay in a following window τ2? In what follows we evaluate the effectiveness of a machine learning solution to the decay prediction problem.

3. The Link Prediction Problem 



The problem of decay prediction is intimately related to the link prediction problem. There are several related but slightly different problems that are termed "link prediction" in the computer science literature. The most closely related one, originally studied by Liben-Nowell and Kleinberg (2007), can be stated as follows: given the state of a network G = (V, E) at time t, predict which new edges will form between the vertices of V in the time interval (t, t + Δt). See Bilgic et al. (2007) and Clauset et al. (2008) for additional work in this vein, or Getoor and Diehl (2005) for a survey.
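Liben-Nowell and Kleinberg (2007) compare a number of simple proximity scores for this edge-formation task, one of the simplest being the number of neighbors that two unlinked vertices share. The Python sketch below illustrates that baseline under stated assumptions: the snapshot at time t is treated as undirected and supplied as a list of vertex pairs, and the function name `common_neighbor_scores` is hypothetical. Pairs with no shared neighbor are simply left unranked.

```python
from itertools import combinations


def common_neighbor_scores(edges):
    """Rank unlinked vertex pairs by how many neighbors they share.

    `edges` is an iterable of undirected (u, v) pairs observed at time t.
    Returns [((u, v), score), ...] sorted from highest to lowest score.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    scores = []
    for u, v in combinations(sorted(neighbors), 2):
        if v in neighbors[u]:
            continue  # already linked: only candidate new edges are ranked
        shared = len(neighbors[u] & neighbors[v])
        if shared > 0:
            scores.append(((u, v), shared))

    # Highest score first; ties broken by vertex labels for a stable order
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores


edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
print(common_neighbor_scores(edges))
# → [(('a', 'd'), 2), (('b', 'c'), 2), (('b', 'e'), 1), (('c', 'e'), 1)]
```

A higher score ranks a pair as more likely to become linked in (t, t + Δt). This addresses edge formation; the decay problem studied here instead asks which already-existing edges survive.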

Other authors (Kashima and Abe 2006) have formulated the problem as a binary classification task on a static snapshot of the network, but this version of the problem is less related to the present effort simply because it is not longitudinal in nature. Current research on link prediction in computer science focuses mostly on evaluating the raw predictive ability of different techniques, either by incorporating different vertex and edge attributes (O'Madadhain et al. 2005; Popescul and Ungar 2003) or by selecting different learning methods (Hasan et al. 2005) in order to improve prediction performance. Where we differ from this work, apart from addressing a slightly different problem, is that we attempt to systematically characterize the attributes that lead to successful classification. In other words, rather than being concerned simply with whether, or to what extent, our models succeed or fail, we attempt to characterize why they are successful or unsuccessful by measuring the importance of different attributes, and of weighted edge data, to classification.

4. Previous longitudinal research on large-scale networks 

Several authors have studied the evolution of large networks and identified characteristics that are important to the formation of edges. Kossinets and Watts (2006) studied the evolution of a university email network over time and the extent to which structural properties, such as triadic closure, and homophily contribute to the formation of new edges. Of particular relevance to us, they find that edges that would close triads are more likely to form than edges that do not close triads, and that people who share common acquaintances are much more likely to form edges than people who do not. Similarly, Leskovec et al. (2008) study the evolution (by the arrival of vertices and the formation of edges) of four large online social networks and conclude, among other things, that triadic closure plays a very significant role in edge formation. Both of these factors are related to the notion of embeddedness, which we study in the context of edge decay, but neither of these studies considers edge decay at all. Marsili et al. (2004) develop a model for network evolution that allows for the disappearance of edges, but they do not validate the model on any real-world data. As a result, the extent to which social networks fit the model is unclear, and the model does not shed any light on the mechanisms behind edge decay.



The effort that is closest to ours in principle is a paper by Hidalgo and Rodriguez-Sickert (2008), which analyzes edge persistence and decay on a mobile phone network very similar to our own. However, the analysis undertaken below differs critically from theirs in both methodology and primary focus. The aforementioned paper relies on a highly circumscribed set of well-established physical network statistics (i.e. degree, clustering coefficient), as well as reciprocity, to explain decay. In what follows, we consider time-dependent properties of edges (Burt 2000) as well as features associated with interaction frequency, i.e. edge weight (Marsden and Campbell 1984; Hammer 1985; Barrat et al. 2004).



5. Data and features 



5.1. Data



Our primary source of data in this study consists of information on millions of call records from a large non-U.S. cell phone provider. The data include, for each call, anonymized information about the caller and callee (i.e. a consistent index), along with a timestamp, duration, and the type of call (standard call, text message, voicemail call). Our original dataset is composed primarily of phone calls and text messages. In the empirical analysis that follows, however, we restrict ourselves to dyadic communications that take the exclusive form of a voice call (we exclude text messages). We exclude all vertices with more than fifty neighbors, to ensure that only persons (and not auto-dialing robots) are represented in our data.

Our final dataset consists of all in-network phone calls made over an 8-week period in 2008. We restrict our attention to in-network calls (where both the caller and callee use our provider) because we only have information about calls initiated by our provider's customers. That is to say, if i is on the network but j is not, we know if and when i calls j, but not if and when j calls i. In order to accurately predict the decay of edges, we need to be able to capture the degree of reciprocity in the relationship, meaning we need to be able to see if and when j calls i back. Thus, we only examine edges where both i and j use our cell phone provider.

Statistic                                     Value
Average clustering coefficient (degree ≥ 2)   0.24
Median clustering coefficient                 0.14
Average out-degree                            [illegible]
Median out-degree                             [illegible]
Average total degree                          6.3
Median total degree                           3
Number of vertices                            4,833,408
Number of edges                               16,564,958

Table 1: Basic graph statistics of the cell-phone network.

5.2. Connectivity Criterion 

Naturally, we represent this information as a directed social network, where the vertices are the individual subscribers. An edge exists from actor i to actor j if i places at least one voice call to j during an initial window τ1 = [t, t + Δt), whose length Δt we set to four weeks. Using this connectivity criterion, we identify approximately 16.5 × 10^6 directed edges in the network (see Table 1). Edges can be either bi-directional or directed arcs, depending upon whether j made a call back to i during τ1. Table 1 shows some basic topological statistics of the observed graph.
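As an illustration of the connectivity criterion, the sketch below turns raw call records into directed arc weights c_ij for τ1. It assumes, hypothetically, that each record is a `(caller, callee, timestamp, kind)` tuple with timestamps in seconds, that `kind == "voice"` marks a voice call, and that records are already restricted to in-network subscribers. It also applies the fifty-neighbor exclusion from Section 5.1, counting distinct contacts in either direction within τ1; the text does not say how neighbors were counted, so that choice (and the name `build_tau1_arcs`) is an assumption of this sketch.

```python
from collections import Counter, defaultdict

WEEK = 7 * 24 * 3600  # seconds in one week


def build_tau1_arcs(calls, t0, window=4 * WEEK, max_neighbors=50):
    """Directed arc weights c_ij for voice calls placed in [t0, t0 + window).

    `calls` holds (caller, callee, timestamp, kind) tuples (hypothetical format).
    Vertices with more than `max_neighbors` distinct contacts, counted in either
    direction within the window (an assumption), are dropped as likely robots.
    """
    weights = Counter()
    for caller, callee, ts, kind in calls:
        if kind == "voice" and t0 <= ts < t0 + window:
            weights[(caller, callee)] += 1  # one more call from caller to callee

    contacts = defaultdict(set)
    for i, j in weights:
        contacts[i].add(j)
        contacts[j].add(i)
    robots = {v for v, nbrs in contacts.items() if len(nbrs) > max_neighbors}

    return {(i, j): c for (i, j), c in weights.items()
            if i not in robots and j not in robots}


calls = [
    ("a", "b", 10, "voice"),
    ("a", "b", 20, "voice"),
    ("b", "a", 30, "voice"),
    ("a", "c", 40, "sms"),          # text message: excluded
    ("c", "a", 5 * WEEK, "voice"),  # falls outside tau_1
]
print(build_tau1_arcs(calls, t0=0))  # → {('a', 'b'): 2, ('b', 'a'): 1}
```

In this representation an arc (i, j) is bi-directional exactly when (j, i) is also a key of the returned dictionary.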

5.3. Features 

Using the connectivity structure of the network constructed from the first four weeks of data, we extract a number of vertex-level, dyad-level and higher-order features based on the intuitions provided by previous research and theory on relationship dynamics (e.g. Hallinan 1978; Burt 2000; Feld et al. 2007), especially as they pertain to behavioral networks with weighted edges (Hammer 1985; Barrat et al. 2004). These features are given in Table 2 and can be grouped into four categories or sets: vertex, dyadic, neighborhood, and temporal features.¹

¹ We do not include any homophily-based features in this analysis, as we do not yet have reliable customer demographic information for the time period in question.

5.3.1. Vertex-level features

The vertex-level features include the outdegrees of i and j (d_i and d_j), and the overall communicative activity of each vertex (c_i and c_j), that is, the overall number of calls made by each member of the dyad during the 4-week time period.

5.3.2. Dyad-level features

The dyad-level features include the directed arc strength, i.e. the number of voice calls made by i to j (c_ij), and the number of calls made by j to i (c_ji). We also compute normalized versions of arc strength (p_ij and p_ji), which are simply the proportion of all calls made by an agent that go to that neighbor, where p_ij = c_ij / c_i.

5.3.3. Neighborhood-level features

The neighborhood-level features include (1) the number of common neighbors between i and j (cn), (2) directional versions of the number of common neighbors (in and jn), which indicate the number of i's (or j's) neighbors that called j (or i), and (3) second-order embeddedness features (injn and jnin), which measure the number of edges among i's and j's neighbors. injn does this by counting as an edge calls made from one of i's neighbors to one of j's neighbors, while jnin considers an edge as existing when one of j's neighbors calls one of i's neighbors.
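The vertex-, dyad- and neighborhood-level definitions above can all be computed from the τ1 arc weights. The following is a minimal sketch, assuming arc weights are held in a `{(i, j): c_ij}` dictionary (a hypothetical format) and that the function name `dyad_features` is illustrative only. Neighborhoods are taken to be undirected (a call in either direction) and to exclude the dyad partner; both are interpretive assumptions rather than details stated above. Because `in` is a Python keyword, features are returned in a dictionary keyed by their names in the text.

```python
from collections import defaultdict


def dyad_features(weights, i, j):
    """Vertex-, dyad- and neighborhood-level features for the arc (i, j).

    `weights` maps directed arcs (u, v) to call counts c_uv observed in tau_1.
    Neighborhoods are undirected and exclude the dyad partner (assumptions).
    """
    out_deg = defaultdict(int)     # d_v: number of distinct vertices v calls
    calls_made = defaultdict(int)  # c_v: total calls placed by v
    nbrs = defaultdict(set)        # undirected neighbor sets
    for (u, v), c in weights.items():
        out_deg[u] += 1
        calls_made[u] += c
        nbrs[u].add(v)
        nbrs[v].add(u)

    c_ij = weights.get((i, j), 0)
    c_ji = weights.get((j, i), 0)
    n_i = nbrs[i] - {j}
    n_j = nbrs[j] - {i}
    return {
        # 5.3.1 vertex level
        "d_i": out_deg[i], "d_j": out_deg[j],
        "c_i": calls_made[i], "c_j": calls_made[j],
        # 5.3.2 dyad level: raw and normalized arc strength
        "c_ij": c_ij, "c_ji": c_ji,
        "p_ij": c_ij / calls_made[i] if calls_made[i] else 0.0,
        "p_ji": c_ji / calls_made[j] if calls_made[j] else 0.0,
        # 5.3.3 neighborhood level
        "cn": len(n_i & n_j),
        "in": sum((k, j) in weights for k in n_i),  # i's neighbors who called j
        "jn": sum((k, i) in weights for k in n_j),  # j's neighbors who called i
        "injn": sum((a, b) in weights for a in n_i for b in n_j if a != b),
        "jnin": sum((a, b) in weights for a in n_j for b in n_i if a != b),
    }


weights = {("a", "b"): 3, ("b", "a"): 1, ("a", "c"): 1,
           ("c", "b"): 2, ("b", "d"): 1, ("d", "c"): 1}
print(dyad_features(weights, "a", "b"))
```

The sketch rebuilds the degree, activity and neighbor tables on every call for readability; for a network with millions of arcs one would compute them once and reuse them across dyads.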

5.3.4. Temporal features 

Finally, we look at two features related to the (observed) temporal evolution of dyadic communicative behavior. The first, fdate, captures edge newness as indicated by the time of the first call from i to j during our temporal window τ1; that is, fdate marks how far into τ1 we first observe a call from i to j. Higher values indicate newer edges, while smaller values indicate older edges. The second temporal feature, edate, captures edge freshness as indicated by the time of the last call made by i to j, given that the edge has already been observed to exist. Higher values indicate that the edge was active in the more recent past, while smaller values indicate that the edge has been inactive for a longer period of time.
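Both temporal features reduce to the earliest and latest call times on the arc. A minimal sketch, assuming timestamps in seconds and measuring both features in days from the start of τ1 (the text does not fix a unit, so days are a choice of this illustration, as is the name `temporal_features`):

```python
SECONDS_PER_DAY = 24 * 3600


def temporal_features(call_times, t0):
    """fdate and edate for one arc, in days since the start of tau_1 (t0).

    `call_times` are timestamps (seconds) of the calls from i to j in tau_1.
    Larger fdate = newer edge; larger edate = fresher (more recently active) edge.
    """
    if not call_times:
        raise ValueError("arc (i, j) needs at least one call in tau_1")
    return {
        "fdate": (min(call_times) - t0) / SECONDS_PER_DAY,  # first call i -> j
        "edate": (max(call_times) - t0) / SECONDS_PER_DAY,  # last call i -> j
    }


# Calls on days 3, 1 and 20 of the window
print(temporal_features([3 * SECONDS_PER_DAY, SECONDS_PER_DAY, 20 * SECONDS_PER_DAY], t0=0))
# → {'fdate': 1.0, 'edate': 20.0}
```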

To the best of our knowledge, prior work has not considered the freshness of an edge as a predictor of persistence/decay, though edge newness has been seen as an important predictor of short-term decay via Burt's isolation of the phenomenon of the "liability of newness" of social ties (Burt 1997, 2000). We believe that the freshness of an edge could be an important predictor, as it indicates how current the edge is, and we expect that more current edges are more relevant in the immediate future. If persistence or decay are partly a Markov process with a relatively short memory, then edge freshness should emerge as an important predictive factor.

5.4. Edge-decay and edge-persistence criterion 

We use these features to build a model for predicting whether edges fall into two disjoint classes: persistent or decayed. For the purposes of this analysis, an edge is said to persist if it is observed to exist in the time period τ2 = [t + Δt, t + Δt + Δt'), given that it was observed in the previous time period τ1 = [t, t + Δt), using the same connectivity criterion outlined in Section 5.2 above. Conversely, an edge is said to have decayed if it was observed to exist in τ1 = [t, t + Δt) but can no longer be detected in τ2 = [t + Δt, t + Δt + Δt') using the same operational criterion.² Note that the observation and criterion periods are evenly divided, so that τ1 and τ2 each span 4 weeks (Δt = Δt'; see Figure 1).

² We believe that a 4-week period is a long enough time window for determining edge persistence/decay, at least in the short to medium term. While technically an edge could be inactive during this period and reappear afterwards, all indications are that very few edges are like this, and those that are are very weak and fleeting. While we could have lengthened the τ2 time period, this would have meant shortening the τ1 period, and doing so would have affected our estimates of the edge features that we use in the analysis. Given that we have a total time window of τ1 + τ2 = 8 weeks, we decided that the best strategy is to divide the period in half and define decay as the non-occurrence of voice calls between i and j in the second time period.

Figure 1: Diagram of the data-splitting procedure used to generate basic features and to determine dyadic class-membership in the analysis below.

6. Machine-Learning models for the edge-classification problem

Having obtained a set of structural features from the network built from the information observed in τ1, our final task is to build a model that allows us to most effectively assign each edge to either the persistent or the decayed class using the criterion outlined in Section 5.4 above. Given the large scale of our communication network, we turn to methods from data mining and machine learning to accomplish this task.

We proceed by arranging the available data as a set of instances or examples, each of which is observed to belong to a given class, which in our case is either persistent or decayed. As we noted above, associated with each instance is a set of features or attributes. The task is to build a generalizable model from the available data. In our case, since our class takes only the value 0 (decayed) or 1 (persistent), we need to derive a function F: X → {0, 1} which predicts (with some ascertainable accuracy) the class of an instance given its vector of features X.

After building the model, we need to validate its effectiveness on a set of instances that are different from those used to build the model. Typically this is done by dividing the available data into two disjoint subsets (the horizontal line in Figure 1). The first subset, called the training data, is used to build the model. Once the model is built, we use it to predict the class of each instance in the test data. The effectiveness of the model, then, depends on its accuracy (or some other measure of performance) on the test data. As shown in Figure 1, the data are split within each time period (τ1, τ2) into training and test subsets. In the analysis reported below, we randomly assigned 2/3 of the original examples from the first period to the training set and used the remaining 1/3 as the test set. The figure also shows the number of dyads in the training and test sets that ended up in the decayed class (about 43%).
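The persistence/decay criterion of Section 5.4 and the random 2/3 to 1/3 split described above can be sketched together. The sketch assumes arcs are given as `(i, j)` tuples that have already passed the connectivity criterion in each window; the function name `label_and_split` and the fixed random seed are illustrative choices, not details from the study.

```python
import random


def label_and_split(arcs_tau1, arcs_tau2, train_frac=2 / 3, seed=0):
    """Label tau_1 arcs as persistent (1) or decayed (0), then split them.

    An arc persists if it meets the connectivity criterion again in tau_2
    (at least one voice call from i to j); otherwise it has decayed.
    Returns (train, test), each a list of ((i, j), label) pairs.
    """
    active_in_tau2 = set(arcs_tau2)
    labeled = [(arc, int(arc in active_in_tau2)) for arc in sorted(arcs_tau1)]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(labeled)
    cut = round(len(labeled) * train_frac)
    return labeled[:cut], labeled[cut:]


tau1 = [("a", "b"), ("b", "a"), ("a", "c")]
tau2 = [("a", "b"), ("c", "d")]  # ("c", "d") is a new arc and gets no label
train, test = label_and_split(tau1, tau2)
print(len(train), len(test))  # → 2 1
```

Arcs that first appear in τ2 receive no label, since the decay problem only concerns edges already observed in τ1.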

6.1. The decision-tree classifier 

There are a number of potential models available for evaluating the effectiveness of our chosen features at predicting edge decay. Perhaps the simplest of these, which was used in Burt's analysis (Burt 2000), is simple regression: plotting each feature as the independent variable against the probability of decay. Such an approach has the distinct advantage of being easy to interpret. Regression is, of course, only one of many classification tools available. In what follows we present results obtained from both a logistic regression classifier (which provides easily interpretable output that can be compared with previous research on the subject) and a decision-tree classifier, an approach that has not been used very often in the analysis of social networks.³ While relatively unfamiliar in the analysis of social networks, the decision tree is one of the most well-known and well-researched methods in data mining, and it provides output that is easily translatable into a set of disjoint "rules" for (probabilistically) assigning different cases to one of the two outcomes. In our case we are interested in what combinations of features maximize either edge decay or edge persistence. Because readers may not be wholly familiar with the decision-tree classifier, we provide a brief introduction to the basics of the approach before presenting the results. We presume that readers are familiar with the basics of logistic regression, so we will not discuss it in detail.
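The "simple regression" view described above amounts to tracing how the empirical probability of decay varies with a single feature. A minimal sketch, assuming equal-width bins (a choice made here for illustration, not necessarily Burt's procedure) and outcomes coded 1 for decayed and 0 for persistent, which reverses the 0/1 coding used for the classifier; the name `decay_rate_by_bin` is hypothetical.

```python
from collections import defaultdict


def decay_rate_by_bin(values, decayed, bin_width):
    """Empirical share of decayed edges within equal-width bins of one feature.

    `values` are feature values (e.g. c_ij); `decayed` holds matching 0/1 flags
    with 1 = decayed. Returns {lower_bin_edge: decay_rate} in ascending order.
    """
    if len(values) != len(decayed):
        raise ValueError("values and decayed must have the same length")
    totals = defaultdict(int)
    decays = defaultdict(int)
    for x, y in zip(values, decayed):
        lower = (x // bin_width) * bin_width  # left edge of the bin holding x
        totals[lower] += 1
        decays[lower] += y
    return {b: decays[b] / totals[b] for b in sorted(totals)}


rates = decay_rate_by_bin([1, 2, 3, 6, 7, 12], [1, 1, 0, 1, 0, 0], bin_width=5)
print(rates)
```

Plotting these rates against the bin edges gives the kind of single-feature curve described above; the classifiers below instead combine all features at once.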

³ An important consideration with machine learning methods, as with statistical methods, is the choice of model to which we attempt to fit the data. A nearly boundless series of models has been developed in the literature (Witten and Frank 2005), and a discussion of the merits of each is well beyond the scope of this paper. Instead, we will discuss one of the models we chose (a decision tree model called C4.5; Quinlan 1993) and its relative strengths for the problem at hand.

A decision tree classifies examples with a hierarchical set of rules. A decision tree model is built by recursively dividing the feature space into purer (more discriminating) subspaces along splits that are parallel to the feature axes. A very simple example is shown in Figure 2. Given the task of classifying unknown points as either blue circles or red squares (Figure 2(a)), a decision tree trained on the data points shown will produce a series of splits along the two dimensions in the figure (x and y). Generally, the first split is along whichever attribute is deemed to be the best separator of the classes according to some measure. Our implementation of C4.5 determines the "best" split using information gain (which we will formally define in Section 9 below). Our hypothetical decision tree makes its first split along the line down the middle of the figure (x = 5).

Figure 2: (a) Toy classification dataset. (b) Resulting decision tree.

A tree induction algorithm will then recursively divide each of the resulting subspaces until some stopping criterion (e.g., a minimum number of instances per leaf or a minimum leaf purity) is met. Figure 2(b) shows the decision tree generated by the splits corresponding to the "data" in Figure 2(a). Unknown instances are classified by taking the appropriate branches of the tree until a leaf is reached. The class assigned to the unknown instance is whichever class was most common among the training instances at that leaf. The figure shows that any instance with x > 5 and y > 10 is classified as a red square, while everything else is a blue circle. The primary advantage of decision trees for our decay prediction task (besides, of course, reasonable performance) is interpretability. Examining the classification accuracies at the individual leaves, we can see where the model is strong and where it is weak. Additionally, decision trees enable us to assess the relative importance of the features defined in Table 2.
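The induction procedure just described (choose the axis-parallel split with the largest information gain, recurse until a leaf is pure or no split helps, then classify by walking the branches) can be sketched in a few lines. This is a minimal illustration, not C4.5 itself: it omits pruning, gain ratio, and missing-value handling, and the function names and toy points are hypothetical, chosen only to mimic the two-class layout of Figure 2.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(rows, labels, min_leaf=1):
    """Return (feature, threshold, gain) for the best axis-parallel split.

    Candidate thresholds are midpoints between consecutive distinct values;
    rows with value <= threshold go left. Returns (None, None, 0.0) if no
    split reduces entropy.
    """
    base = entropy(labels)
    best = (None, None, 0.0)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            # Size-weighted entropy of the two children.
            cond = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            if base - cond > best[2]:
                best = (f, t, base - cond)
    return best


def build_tree(rows, labels, min_leaf=1):
    """Grow a tree recursively; a leaf is the majority label at that node."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:          # stopping criterion: pure leaf
        return majority
    f, t, _ = best_split(rows, labels, min_leaf)
    if f is None:                      # stopping criterion: no useful split
        return majority
    left = [k for k, r in enumerate(rows) if r[f] <= t]
    right = [k for k, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[k] for k in left], [labels[k] for k in left], min_leaf),
            build_tree([rows[k] for k in right], [labels[k] for k in right], min_leaf))


def classify(tree, row):
    """Follow branches from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Hypothetical toy data in the spirit of Figure 2: "red" iff x > 5 and y > 10.
POINTS = [(2, 4), (3, 12), (4, 15), (1, 8), (6, 3),
          (7, 6), (8, 12), (9, 14), (6.5, 11), (9, 2)]
LABELS = ["blue", "blue", "blue", "blue", "blue",
          "blue", "red", "red", "red", "blue"]

tree = build_tree(POINTS, LABELS)
print(tree)
print(classify(tree, (7.5, 13)), classify(tree, (2, 13)))
```

Because candidate thresholds are midpoints between observed values, the learned cut points need not coincide exactly with the x = 5 and y = 10 boundaries described for Figure 2; with more (or differently placed) training points they would move closer.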

6.2. Outline of the empirical analysis 

In what follows, we consider the following three empirical issues within our chosen time windows:

1. Feature correlation: In the initial time window $T_1 = (t, t + \Delta t)$, what is the correlation structure of the features shown in Table 2?

2. Feature predictiveness: Having observed the network in a time window $T_1 = (t, t + \Delta t)$, which features of the network are most predictive of the class membership of each edge (persistent/decayed) in the adjacent time window $T_2 = (t + \Delta t, t + \Delta t + \Delta t')$?

3. Edge-class prediction: Given a set of feature-predictors from the initial time window $T_1 = (t, t + \Delta t)$, can we build a model that accurately predicts the class membership of the edges observed in the following time window $T_2 = (t + \Delta t, t + \Delta t + \Delta t')$?
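To make the two-window setup concrete, the persistent/decayed class labels could be derived from a call log roughly as follows. This is a sketch under our own assumptions, not the paper's stated definition: we assume a directed edge i → j observed in T1 counts as persistent if i calls j at least once in T2, and we treat both windows as half-open intervals. The function name `label_edges` and the `(caller, callee, timestamp)` record layout are hypothetical.

```python
def label_edges(calls, t, dt, dt2):
    """Label each directed edge active in T1 = [t, t + dt) as persistent or decayed.

    calls: iterable of (caller, callee, timestamp) tuples.
    Assumption (hypothetical): i -> j is "persistent" if i calls j at least
    once in T2 = [t + dt, t + dt + dt2), and "decayed" otherwise.
    """
    t1_edges, t2_edges = set(), set()
    for i, j, ts in calls:
        if t <= ts < t + dt:
            t1_edges.add((i, j))
        elif t + dt <= ts < t + dt + dt2:
            t2_edges.add((i, j))
    # Only edges seen in T1 are labeled; edges new in T2 are ignored.
    return {e: ("persistent" if e in t2_edges else "decayed") for e in t1_edges}


calls = [("a", "b", 1), ("a", "b", 9), ("a", "c", 2), ("c", "a", 12), ("b", "a", 3)]
print(label_edges(calls, t=0, dt=7, dt2=7))
```

Note that under this directed reading, ("a", "c") is labeled decayed even though c calls a during T2; a symmetric definition would be an equally plausible alternative.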

After briefly considering some basic descriptive statistics on each of the predictor features in the next section, in Section 8 we shed light on the first question by examining the pairwise Spearman correlation coefficients (ρ) among all pairs of features in Table 2. In Section 9 we address the second question using an information-theoretic measure of randomness for predicting short-term decay. Finally, Section 10 addresses the final question by formulating the edge-decay prediction task as a binary classification problem using a machine-learning data analysis strategy.

Figure 3: Cumulative distributions of the features included in the analysis. Panels include (a) outdegree of i; (b) outdegree of j; (c) number of calls made by i; ...; (m) number of calls from j to i's neighbors; (n) time of first call from i to j; (o) time of last call from i to j.

Feature | Description | Range | Median
Vertex level | | |
d_i | Outdegree of i (ego-network range) | 1-49 | 2
d_j | Outdegree of j (ego-network range) | 0-49 | 1
c_i | Number of calls made by i (gregariousness) | 1-1366 | 22
c_j | Number of calls made by j (gregariousness) | 0-1366 | 22
Dyad level | | |
c_ij | Calls from i to j (directed edge strength) | 1-1341 | 2
c_ji | Calls from j to i (reciprocated edge strength) | 0-1341 | 1
p_ij | Proportion of i's calls that go to j (c_ij/c_i) | 0-1 | 0.15
p_ji | Proportion of j's calls that go to i (c_ji/c_j) | 0-1 | 0.08
Neighborhood level | | |
cn | Number of common neighbors between i and j (edge embeddedness) | 0-46 | 0
in | Number of i's neighbors that call j (directed edge embeddedness) | 0-39 | 1
jn | Number of j's neighbors that call i (directed edge embeddedness) | 0-39 |
injn | Number of calls that i's neighbors make to j's neighbors (2nd-order embeddedness) | 0-274 | 7
jnin | Number of calls that j's neighbors make to i's neighbors (2nd-order embeddedness) | 0-274 | 6
Temporal | | |
fdate | Normalized time of first call from i to j (edge newness) | 0-1 | 0.26
edate | Normalized time of last call from i to j (edge freshness) | 0-1 | 0.74

Table 2: List of features to be used in predicting persistence/decay of the edge i → j.

7. Feature statistics 

The range and median values of the features are computed from data in the first 4-week time period, and summary statistics are provided in Table 2. As shown in Figure 3, the distributions are highly skewed, with substantially more low than high values, so we report medians. As noted earlier, we omit edges with vertices whose degrees are greater than or equal to 50 in order to eliminate robot calling, so vertex outdegree never exceeds 49. Note that we have included asymmetric edges, that is, edges in which i called j during T1 but j did not return a call during that time period, so there are zero values for d_j, c_j, and c_ji.

During this T1 period the median outdegree for the focal vertex i is 2, but for its paired vertex j it is 1, because of the presence of asymmetric edges in which d_i > 0 and d_j = 0. The median value for the number of calls made by subscribers is 22. At the dyad level, the median number of calls from i to j is 2, but from j to i it is 1, again because of asymmetric edges. c_ji should therefore be viewed as an indicator of reciprocity, in that it indicates the extent to which vertex j makes calls to i given that i made at least one call to j. p_ij and p_ji are normalized versions of c_ij and c_ji, respectively, and indicate what proportion of the total calls made by the subscriber went to each of its neighbors. The median value of p_ij of 0.15 indicates that, for the edge at the middle of the p_ij distribution, about 15% of i's total calls went to j.

 | d_i | d_j | c_i | c_j | c_ij | c_ji | p_ij | p_ji | cn | in | jn | injn | jnin | fdate | edate
d_i | 1.00 | 0.29 | 0.76 | 0.19 | 0.00 | -0.09 | -0.68 | -0.18 | 0.27 | 0.29 | 0.18 | 0.45 | 0.48 | 0.03 | 0.00
d_j | 0.29 | 1.00 | 0.22 | 0.80 | 0.02 | 0.23 | -0.18 | -0.01 | 0.38 | 0.24 | 0.37 | 0.89 | 0.75 | 0.00 | 0.02
c_i | 0.76 | 0.22 | 1.00 | 0.21 | 0.26 | 0.08 | -0.61 | -0.02 | 0.27 | 0.28 | 0.21 | 0.38 | 0.41 | -0.13 | 0.15
c_j | 0.19 | 0.80 | 0.21 | 1.00 | 0.13 | 0.38 | -0.07 | 0.06 | 0.36 | 0.23 | 0.35 | 0.73 | 0.66 | -0.08 | 0.09
c_ij | 0.00 | 0.02 | 0.26 | 0.13 | 1.00 | 0.55 | 0.53 | 0.49 | 0.22 | 0.23 | 0.20 | 0.13 | 0.21 | -0.51 | 0.52
c_ji | -0.09 | 0.23 | 0.08 | 0.38 | 0.55 | 1.00 | 0.38 | 0.89 | 0.27 | 0.18 | 0.31 | 0.37 | 0.47 | -0.33 | 0.33
p_ij | -0.68 | -0.18 | -0.61 | -0.07 | 0.53 | 0.38 | 1.00 | 0.41 | -0.09 | -0.09 | -0.05 | -0.24 | -0.21 | -0.30 | 0.29
p_ji | -0.18 | -0.01 | -0.02 | 0.06 | 0.49 | 0.89 | 0.41 | 1.00 | 0.14 | 0.07 | 0.18 | 0.16 | 0.31 | -0.29 | 0.29
cn | 0.27 | 0.38 | 0.27 | 0.36 | 0.22 | 0.27 | -0.09 | 0.14 | 1.00 | 0.77 | 0.80 | 0.56 | 0.57 | -0.12 | 0.14
in | 0.29 | 0.24 | 0.28 | 0.23 | 0.23 | 0.18 | -0.09 | 0.07 | 0.77 | 1.00 | 0.64 | 0.43 | 0.46 | -0.12 | 0.14
jn | 0.18 | 0.37 | 0.21 | 0.35 | 0.20 | 0.31 | -0.05 | 0.18 | 0.80 | 0.64 | 1.00 | 0.52 | 0.53 | -0.11 | 0.13
injn | 0.45 | 0.89 | 0.38 | 0.73 | 0.13 | 0.37 | -0.24 | 0.16 | 0.56 | 0.43 | 0.52 | 1.00 | 0.93 | -0.06 | 0.08
jnin | 0.48 | 0.75 | 0.41 | 0.66 | 0.21 | 0.47 | -0.21 | 0.31 | 0.57 | 0.46 | 0.53 | 0.93 | 1.00 | -0.10 | 0.13
fdate | 0.03 | 0.00 | -0.13 | -0.08 | -0.61 | -0.33 | -0.30 | -0.29 | -0.12 | -0.12 | -0.11 | -0.06 | -0.10 | 1.00 | 0.16
edate | 0.00 | 0.02 | 0.15 | 0.09 | 0.52 | 0.33 | 0.29 | 0.29 | 0.14 | 0.14 | 0.13 | 0.08 | 0.13 | 0.16 | 1.00

Figure 4: Spearman correlation (ρ) between pairs of features in one week of call data.

Turning to the neighborhood-level features, the vertices joined by the median edge do not share a common neighbor, as indicated by the median value of 0 for cn. There appears to be a good deal more embeddedness when we measure it in terms of the calls between i's and j's neighbors instead of directed edges from i's (or j's) neighbors to j (or i): the neighbors of vertices joined by the median edge are expected to make about seven calls to one another. Finally, we have normalized the two temporal features so that their values range from 0 to 1. Values of fdate and edate indicate where, in the 4-week time period, the relevant events occurred. For fdate, the newness of the edge, the median value is .26, indicating that 50% of the edges were active (a call had been made from i to j) at or before about one week had elapsed (0.26 × 4 weeks ≈ 1 week). For edate, the freshness of an edge, the median value of .74 indicates that for 50% of the edges the last call from i to j was made before about three weeks had transpired (0.74 × 4 weeks ≈ 3 weeks).
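As an illustration of how the Table 2 features might be computed for a single edge from raw call records, consider the sketch below. Several operational details are not specified in the text, so they are our hypothetical assumptions and are flagged in the docstring: a vertex's neighbors are taken to be the vertices it called during the window, i and j are excluded from each other's neighbor sets when counting embeddedness, and fdate and edate are rescaled linearly over the window. The function name `edge_features` is hypothetical.

```python
from collections import Counter


def edge_features(calls, i, j, t_start, t_end):
    """Sketch of several Table 2 features for the directed edge i -> j.

    calls: iterable of (caller, callee, timestamp) tuples.
    Hypothetical assumptions (not taken from the paper): a vertex's neighbors
    are the vertices it called in [t_start, t_end); i and j are excluded from
    each other's neighbor sets for the embeddedness counts; fdate and edate
    are rescaled linearly to [0, 1] over the window.
    """
    window = [(a, b, ts) for a, b, ts in calls if t_start <= ts < t_end]
    counts = Counter((a, b) for a, b, _ in window)
    if counts[(i, j)] == 0:
        raise ValueError("edge i -> j has no calls in this window")

    def neighbors(v):
        return {b for (a, b) in counts if a == v}

    def calls_made(v):
        return sum(n for (a, _), n in counts.items() if a == v)

    n_i, n_j = neighbors(i), neighbors(j)
    oth_i, oth_j = n_i - {i, j}, n_j - {i, j}
    c_i, c_j = calls_made(i), calls_made(j)
    c_ij, c_ji = counts[(i, j)], counts[(j, i)]
    times = [ts for a, b, ts in window if (a, b) == (i, j)]
    span = t_end - t_start
    return {
        "d_i": len(n_i), "d_j": len(n_j),            # outdegree (range)
        "c_i": c_i, "c_j": c_j,                      # gregariousness
        "c_ij": c_ij, "c_ji": c_ji,                  # directed / reciprocated strength
        "p_ij": c_ij / c_i,
        "p_ji": c_ji / c_j if c_j else 0.0,
        "cn": len(oth_i & oth_j),                    # common neighbors
        "in": sum(1 for k in oth_i if counts[(k, j)] > 0),
        "jn": sum(1 for k in oth_j if counts[(k, i)] > 0),
        "injn": sum(counts[(k, l)] for k in oth_i for l in oth_j),
        "jnin": sum(counts[(k, l)] for k in oth_j for l in oth_i),
        "fdate": (min(times) - t_start) / span,      # newness
        "edate": (max(times) - t_start) / span,      # freshness
    }


calls = [("a", "b", 1), ("a", "b", 6), ("b", "a", 3), ("a", "c", 2), ("b", "c", 4),
         ("c", "d", 5), ("b", "d", 7), ("d", "c", 8), ("a", "b", 12)]
print(edge_features(calls, "a", "b", 0, 10))
```

Reading a `Counter` with a missing key returns 0 without inserting it, so the embeddedness counts do not mutate `counts` while it is being iterated.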

8. Feature correlations 

Figure 4 shows the Spearman correlations that capture the association between features during the first four-week time period. A number of very strong correlations (shaded red for positive and blue for negative correlations in the figure) are immediately apparent. Among the vertex-level features, degree (d_i or d_j) and gregariousness (c_i or c_j) are highly correlated (.76 and .80), indicating that vertices with more neighbors make more calls. Among the dyad-level features, the normalized and raw measures of directed edge weight are correlated (ρ = 0.53), as expected given that the normalized measure (p_ij) is a function of the raw measure (c_ij). The correlation between c_ij and c_ji is positive (ρ = 0.55), indicating the presence of reciprocity: as the number of calls from i to j increases, so does the number of calls from j to i. However, this correlation is not extremely high, indicating that dyads do vary in their level of reciprocity. Among the neighborhood-level features, the correlations are positive and for the most part large, which could indicate that the simplest measure of embeddedness, the number of common neighbors (cn), is a good enough measure and one does not need to look at directional or second-order embeddedness. Finally, it is interesting to note that the two temporal features, fdate and edate, are only weakly correlated (ρ = 0.16): edges that are relatively older (i.e., whose first call occurred earlier in the 4-week time period) can be more or less current.
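Spearman's ρ is simply the Pearson correlation computed on ranks, which is why it suits the heavily skewed, tie-laden count features summarized in Figure 3. The sketch below spells this out with tied values receiving the average of the ranks they span; the helper names and the small d_i/c_i sample are hypothetical. In practice, `scipy.stats.spearmanr` computes the same statistic.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in order[start:end + 1]:
            ranks[k] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-edge values: outdegree d_i and calls made c_i.
d_i = [1, 2, 2, 3, 5, 8]
c_i = [4, 10, 9, 30, 41, 200]
print(round(spearman(d_i, c_i), 3))
```

Because only ranks enter the calculation, the single extreme value of 200 calls has no more influence than any other observation, unlike in a Pearson correlation on the raw counts.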

Looking at the correlations across the various categories of features, what is striking is how low they are. For the most part, vertex, dyadic, neighborhood and temporal features are nearly independent of one another. One exception to this pattern is the correlation between outdegree and normalized edge weight, most clearly between d_i and p_ij (ρ = -0.68). This has to be the case: because the p_ij's for a given i sum to 1, the more neighbors a person has, the lower the proportion of their calls going to each neighbor must be on average (the so-called bandwidth/range trade-off; Aral and Van Alstyne 2007). Another exception to the pattern of low correlations between features in different categories is the high correlation between outdegree (and gregariousness) and the second-order embeddedness features, injn and jnin. Agents that have more neighbors are also going to have more edges between their neighbors and the neighbors of the other vertices to which they are connected. As such, our second-order embeddedness features essentially reduce to indicators of vertex range. Finally, the two temporal features, fdate and edate, are correlated with edge weight (c_ij and, to a lesser extent, c_ji). Recall that fdate measures the newness of an edge and that lower values indicate older edges. The negative correlation of fdate with c_ij therefore indicates that newer edges are weaker and older edges are stronger. The positive correlation of edate with c_ij indicates that fresher edges (i.e., edges in which a call has been made more recently) are also stronger.

In sum, it appears that there are really four relatively independent sets of edge features pertaining to the vertex, dyadic, neighborhood and temporal levels. While there are multiple indicators within each of these sets, they tend to be highly correlated, with the exception of the two temporal features. Though in the remainder of the paper we will be looking at the predictive value of all these features, based on these correlations our focus will be on the following potentially important features: outdegree (both d_i and d_j), edge weight (c_ij), reciprocated edge weight (c_ji), the number of common neighbors (cn), and both the newness of the edge (fdate) and its freshness (edate).

Feature | Description | Info. Gain
d_i | Degree of i | 0.00235
d_j | Degree of j | 0.00234
c_i | Calls made by i | 0.01181
c_j | Calls made by j | 0.00449
c_ij | Calls from i to j | 0.17948
c_ji | Calls from j to i | 0.12823
p_ij | Proportion of i's calls that go to j | 0.13318
p_ji | Proportion of j's calls that go to i | 0.12043
in | Number of i's neighbors that call j | 0.02478
jn | Number of j's neighbors that call i | 0.02303
cn | Number of common neighbors between i and j | 0.02441
jnin | Number of j's neighbors that call i's neighbors | 0.01501
injn | Number of i's neighbors that call j's neighbors | 0.00493
fdate | Time of first call from i to j | 0.05104
edate | Time of last call from i to j | 0.09954

Table 3: Information gain of each feature for predicting short-term edge decay using four weeks of data. The information gain measures the conditional ability of that feature to predict edge decay in the subsequent week within levels of the other features.
9. Feature Predictiveness

We wish to determine the extent to which each of the above features, as observed in the first time window, helps us classify edges as either decayed or persistent in the following time window. By determining this, we can quantify, to some extent, the usefulness of the features for decay prediction. There are several possible indicators of predictive ability. Here we rely on the information gain, a standard measure of feature predictiveness in data mining (Witten and Frank 2005). Approaches to determining the importance of predictors based on information theory are common in statistics (Menard 2004; Gilula and Haberman 2001). Information-theoretic approaches have been applied before in the characterization of overall structural features of social networks (e.g., Butts 2001; Leydesdorff 1991); here they are deployed in the interest of quantifying the predictive ability of fine-grained (local) structural features for the link prediction problem.

9.1. Formal definition of Information Gain 

Information gain tracks the decrease in entropy associated with conditioning on an attribute, where entropy is a measure of the randomness (alternatively, unpredictability) of a quantity. To understand the measure and how it quantifies feature importance, consider as an example our two-class cell phone dataset, where each edge is either persistent (class 1) or decayed (class 0). If we define p(x) as the proportion of instances of class 1 and q(x) as the proportion of class 0, the entropy H(x) is defined as:

$$H(x) = -p(x)\log p(x) - q(x)\log q(x) \qquad (1)$$

where all logarithms are taken to base 2. If the two classes are perfectly balanced, then the entropy $H(x) = \log 2 = 1$. As the classes become increasingly imbalanced, the entropy decreases; that is to say, we know more a priori about the class of a random instance. If a particular feature is informative, then conditioning on that feature should decrease the entropy of the dataset. Suppose, for example, that a feature F takes a set K of possible values. The conditional entropy of the dataset conditioned on the feature F is:

$$H(x \mid F) = \sum_{k \in K} w_k \left[ -p_k(x)\log p_k(x) - q_k(x)\log q_k(x) \right] \qquad (2)$$

where $w_k = P(F = k)$ is the proportion of instances for which F takes the value k, $p_k(x) = P(\text{class} = 1 \mid F = k)$ is the proportion of positive-class instances among the instances where F takes the value k, and, similarly, $q_k(x)$ is the proportion of negative-class instances. The information gain for the feature F is the decrease in entropy achieved by conditioning on F: $I(x \mid F) = H(x) - H(x \mid F)$.

Returning to our hypothetical example, suppose there is a feature F that takes on two values, A and B. If instances with F = A are 90% class 1 and instances with F = B are 90% class 0 (and the two values are equally common, so that the classes are balanced overall), then $I(x \mid F) = \log 2 - (-0.9 \log 0.9 - 0.1 \log 0.1) \approx 0.53$. The information gain I(x|F) has an appealing intuitive interpretation as the percentage of information about the class that is revealed by the feature F. By calculating the information gain of each feature in Table 2, we can determine which actor and edge attributes reveal the most information about edge decay.
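The entropy and information-gain calculations can be checked numerically. The sketch below implements Eq. (1) for a binary class and the conditional entropy as a weighted sum, weighting each value k of F by its share of instances, w_k = P(F = k). The function names are ours, and the count of 100 instances per feature value in the example is a hypothetical choice that makes the classes balanced, as in the worked example.

```python
import math


def binary_entropy(p):
    """Eq. (1) in bits: H = -p log2 p - q log2 q with q = 1 - p (0 log 0 taken as 0)."""
    return -sum(v * math.log2(v) for v in (p, 1 - p) if v > 0)


def information_gain(groups):
    """Information gain of a feature F.

    groups: one (n_class1, n_class0) pair per value k of F, each with n > 0.
    """
    n1 = sum(a for a, _ in groups)
    n = sum(a + b for a, b in groups)
    h = binary_entropy(n1 / n)
    # Within-value entropies, weighted by w_k = share of instances with F = k.
    h_cond = sum((a + b) / n * binary_entropy(a / (a + b)) for a, b in groups)
    return h - h_cond


# Worked example: F = A is 90% class 1, F = B is 90% class 0, with a
# hypothetical 100 instances for each value (so the classes are balanced).
print(round(information_gain([(90, 10), (10, 90)]), 3))
```

This prints 0.531, i.e. about 0.53, in line with the worked example up to rounding. A feature whose values all share the overall class mix, such as `[(50, 50), (50, 50)]`, yields a gain of zero.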

9.2. Feature Importance in the Call Network 

Table 3 shows the information gain of each feature described in Table 2, calculated for the first four weeks of data. The results show that the four most predictive features are dyadic features of directed tie strength, as given by the frequency of interaction and the extent to which communications are concentrated on a given alter: the number of calls sent and received along the edge (c_ij and c_ji) and the call proportions from i to j and from j to i (p_ij and p_ji). There is a substantial drop-off in the information gain produced by the remaining features. After the dyad-level features, the most important predictors are associated with the observed age of the tie and the recency of communication (fdate and edate, respectively). Here we observe that the time of first call between i and j (edge newness) is only about half as predictive as the time of last call between i and j (edge freshness) (I(decay|fdate) = 0.05 versus I(decay|edate) = 0.10), suggesting that freshness beats newness as a predictive criterion. These are followed, in terms of predictive ability, by the neighborhood-level features (e.g., the number of common neighbors and the frequency of interaction among neighbors of the two members of the dyad) and the vertex-level features. The predictiveness afforded by either vertex- or neighborhood-level features is comparatively minimal.

These results suggest that previous research on tie decay, which has for the most part been unable to consider the strength of individual ties (as it has limited itself to binary network data), may have missed the most critical single factor for tie decay. This raises the question: do features that have previously been deemed important (such as embeddedness, newness, and range) actually drive tie decay, or are they merely correlated surrogates for tie strength that appear important only when concrete measures of strength are absent?

10. Predicting edge persistence and decay 

10.1. Classifier comparison 

Table 4 summarizes the performance of our two classifiers for each of the four combinations of classifier (decision tree or logistic regression) and target class (persistence or decay). It presents four standard performance metrics: accuracy, precision, recall, and F-Measure. Accuracy is the proportion of all instances that the model correctly classifies. The other three metrics measure the types of error made by the classifier. Recall gives the proportion of observed persistent ties that the model correctly classifies as persisting, while precision gives the proportion of ties that the model predicted as belonging to the persistent class that actually did persist. Precision and recall, to some extent, measure two competing principles. Theoretically, a model could achieve perfect recall by classifying all ties as persistent, but such a model would have low precision. Similarly, a model could achieve very high precision by classifying only its most confident instance as positive, but in doing so it would achieve very low recall. The F-Measure captures the trade-off between precision and recall. It is defined as the harmonic mean of precision (P) and recall (R):

$$F = \frac{2PR}{P + R}$$

where P is the precision and R is the recall of the model in question.
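The metrics defined above follow directly from a 2×2 confusion matrix, and treating decay rather than persistence as the "positive" class just swaps the roles of its cells. The sketch below uses hypothetical confusion counts for illustration and then recomputes one F value in Table 4 from the reported precision and recall; the helper names are ours.

```python
def f_measure(p, r):
    """Harmonic mean of precision p and recall r: F = 2PR / (P + R)."""
    return 2 * p * r / (p + r)


def class_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F-Measure for the 'positive' class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall, f_measure(precision, recall)


# Hypothetical confusion counts with "persist" as the positive class.
tp, fp, fn, tn = 40, 10, 20, 30
print(class_metrics(tp, fp, fn, tn))      # persistence as the positive class
print(class_metrics(tn, fn, fp, tp))      # decay: swap the roles of the cells
# Reproducing an F value in Table 4 from its precision and recall:
print(round(f_measure(0.780, 0.754), 3))  # 0.767
```

Accuracy is identical under either choice of positive class, which is why Table 4 reports the same accuracy in both columns for each classifier, while precision, recall, and F differ by class.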

We evaluate both classifiers on both the majority class, persistence (57% of dyads), and the minority class, decay (43%). A classifier is typically expected to do better on the majority class because there is more available data with which to build the prediction. As shown in the first two columns of Table 4, the decision-tree classifier performs reasonably well with regard to the majority class: it correctly predicts 73.7% of all ties (accuracy) and 75.4% of all persisting ties (recall). With regard to the minority class, decay, the decision-tree classifier does slightly worse: it correctly predicts 71.4% of all decaying ties. The model is also less precise when it comes to predicting decay. About 68.4% of ties predicted to decay do in fact decay, while in the case of persistence about 78% of the ties that the model predicts to persist do in fact persist. Overall, the decision-tree classifier does a good job and shows that tie persistence in social networks is fairly predictable in the short term from local structural, temporal and vertex-level information.

The last two columns of Table 4 present the same fit statistics when we use the logistic regression classifier for the decay/persistence prediction task. The results are very similar to those obtained with the C4.5 decision-tree model. The logistic regression correctly predicts 73.4% of all ties (accuracy), and for the majority class about 72% of persisting ties are correctly classified by the regression model (recall). In contrast to the decision-tree model, recall is higher for the decay class than for the persistence class. The logistic regression model correctly classifies about 75% of decayed ties, while the decision-tree model correctly classifies about 71.4% of decayed ties. This is not a big difference, but it does seem to indicate that in this case the logistic regression model does a slightly better job predicting decay, while the decision-tree model does a slightly better job predicting persistence. However, precision on the decay class is slightly worse in the logistic regression model than in the decision-tree model (0.668 versus 0.684). The result is that the F-Measure is about the same across the two models.





Metric | Tree: Persist | Tree: Decay | Logistic: Persist | Logistic: Decay
Accuracy | 0.737 | 0.737 | 0.734 | 0.734
Precision | 0.780 | 0.684 | 0.796 | 0.668
Recall | 0.754 | 0.714 | 0.722 | 0.751
F | 0.767 | 0.699 | 0.757 | 0.707

Table 4: Comparison of model fit-statistics for the decision-tree and logistic regression classifiers.



In sum, both the decision-tree and logistic regression classifiers indicate that tie persistence and decay patterns are predictable, and that either model yields fairly similar levels of prediction and error. The consistency between these two ways of modeling the data (a more standard regression approach and a relatively non-standard data-mining approach) gives us confidence in the results. After presenting the logistic regression coefficients in the next section, we turn to the decision-tree results and show how they yield new insights about what predicts tie persistence/decay in social networks.

10.2. Logistic regression classifier results

Table 5 shows the parameter estimates from the logistic regression model (predicting the log-odds of a tie persisting) along with the odds ratios. The estimates are based on a full model including all the features. We do not report standard errors, as all the estimates are statistically significant given the large size of the training data on which these parameters are estimated.

Beginning with the features that our information gain values indicated were likely to be the most important (see Table 3 and the discussion in Section 9), we see that the call volume from i to j (directed tie strength, c_ij) has a positive effect on persistence. For each additional call made, the odds of the tie persisting are almost 4% higher. Net of this influence, the number of calls that j makes back to i, c_ji, also has a positive effect. For each additional reciprocating call, the odds of a tie persisting increase by about 2%.
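The percentages above come from exponentiating the logistic coefficients: a coefficient β multiplies the odds of persistence by exp(β) per unit increase in the feature. The short sketch below reproduces the two odds ratios from the β values reported in Table 5 and shows that effects compound multiplicatively over several extra calls; the helper name `odds_ratio` is ours.

```python
import math

# Coefficients (beta) reported in Table 5 for the two call-count features.
BETAS = {"c_ij": 0.0373, "c_ji": 0.0229}


def odds_ratio(beta, delta=1.0):
    """Multiplicative change in the odds of persistence for a delta-unit increase."""
    return math.exp(beta * delta)


for name, beta in BETAS.items():
    pct = (odds_ratio(beta) - 1) * 100
    print(f"{name}: odds x {odds_ratio(beta):.4f} per extra call ({pct:.1f}% higher)")

# Effects compound multiplicatively: ten extra i -> j calls multiply the odds
# by exp(10 * 0.0373), about 1.45, rather than adding ten separate increments.
print(round(odds_ratio(BETAS["c_ij"], 10), 2))
```

The same function applies to the normalized temporal features, where a one-unit change spans the entire window: for fdate, exp(-2.3021) ≈ 0.10.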

The effects of the outdegree of each member of the dyad have opposite signs. In general, an edge that starts from a vertex with a large number of neighbors has a higher chance of decaying. However, if that edge is directed at a vertex of high degree, then it has a higher chance of persisting. These effects have a straightforward interpretation: high-degree actors have less persistent edges, but this effect is mitigated when these edges are directed towards other high-degree actors. The other two vertex-level features, pertaining to gregariousness (c_i and c_j), have very small effects. This indicates that, after adjusting for degree, raw communicative activity does not appear to be involved in processes of edge persistence and decay.

Turning to the neighborhood-level measures, all the effects are positive except for the second-order embeddedness measure jnin, which, as noted earlier, is correlated with d_i and d_j. In general, embeddedness increases the odds that a tie will persist, consistent with previous research showing that embedded edges decay at a slower rate (Burt 1997, 2000). For each additional common neighbor between i and j, the odds of a tie persisting increase by 5.4%. The directed embeddedness measures in and jn appear to be even stronger. For example, for each additional neighbor of i that calls j, the odds of the tie persisting increase by 15%.

Finally, the temporal measures have opposite effects. fdate has a negative effect on persistence, indicating that newer ties (which have higher values on fdate) are more likely to decay; this is indicative of the liability of newness that Burt (1997) notes is an important characteristic of social ties. On the other hand, edate, the freshness of the tie, has a positive effect on persistence: ties that have been activated recently are more likely to persist than those that have been inactive.

Footnote: This suggests that the bulk of the fluctuating, low-persistence edges characteristic of high-degree actors are those which are directed towards actors of low degree. When popular actors connect to other popular actors, their relationships tend to be more stable than when they connect to low-degree alters. Conversely, while low-degree actors tend to have, on average, more stable relationships, these become even more stable when directed at more popular alters.

Feature | Description | β | Odds (exp(β))
d_i | Degree of i | -0.0335 | 0.9671
d_j | Degree of j | 0.0057 | 1.0057
c_i | Calls made by i | 0.0003 | 1.0003
c_j | Calls made by j | -0.0013 | 0.9987
c_ij | Calls from i to j | 0.0373 | 1.0380
c_ji | Calls from j to i | 0.0229 | 1.0232
p_ij | Proportion of i's calls that go to j | 0.0504 | 1.05178
p_ji | Proportion of j's calls that go to i | 0.8521 | 2.3446
in | Number of i's neighbors that call j | 0.1409 | 1.1513
jn | Number of j's neighbors that call i | 0.0877 | 1.0917
cn | Number of common neighbors between i and j | 0.0525 | 1.05391
jnin | Number of j's neighbors that call i's neighbors | -0.0366 | 0.9641
injn | Number of i's neighbors that call j's neighbors | 0.0416 | 1.0425
fdate | Time of first call from i to j | -2.3021 | 0.1000
edate | Time of last call from i to j | 2.9218 | 18.5747

Table 5: Logistic regression coefficients of the effect of each feature in predicting edge persistence.

10.3. Decision-tree classifier results 

As we mentioned in Section 6.1, the structure of decision trees can offer insights into the underlying characteristics of the data on which they were trained. Recall that, at each subtree, our C4.5 implementation chooses the attribute with the largest information gain on the data within that subtree. This means that, at each step, the attribute providing the greatest amount of additional information is chosen for further splitting. Figure 5 shows selected branches of the resulting decision tree obtained from the training data. In the figure, directed edge weight (c_ij), as measured by the number of calls directed from one person to another, is the strongest discriminator of class membership, as we saw earlier (Table 3), and thus stands as the top node of the tree. At deeper levels of the tree we find that, conditional on directed edge weight, other dyadic features and one temporal feature help to predict tie decay, whereas vertex-level factors such as degree and neighborhood-level factors such as the number of common neighbors do not.

The left-hand side of the figure shows that the optimal directed edge-weight (c_ij) cutoff differentiating persistent from decayed dyads in our data is approximately 3. Dyads in which i contacted j more than three times in the initial 4-week period have very strong odds of being classified as active in the following 4-week period (p = 0.86). If, in addition (as we follow the tree to the third level), the edge has been activated recently (has high freshness), then the tie is very likely to persist (p = .91). If the edge has not been refreshed recently, however, the probability of persistence drops substantially (p = .67).

Figure 5: Selected leaves of the best-fitting decision tree obtained from the training set. Recoverable node statistics:
  Root: N = 10,932,872; decayed 4,672,437; Prob(Persist) = 0.57; split on directed tie strength (c_ij)
    c_ij > 3: N = 4,078,056; decayed 639,263; Prob(Persist) = 0.86; split on edge freshness (edate)
      lower freshness: N = 1,098,460; decayed 363,086; Prob(Persist) = 0.67
      higher freshness: N = 2,979,596; decayed 276,177; Prob(Persist) = 0.91
    c_ij ≤ 3: N = 6,854,816; persistent 2,821,642; Prob(Decay) = 0.59; split on reciprocated tie strength (c_ji)
      reciprocated: N = 2,235,419; decayed 958,956; Prob(Persist) = 0.57; split on directed tie strength (c_ij)
        higher c_ij: N = 1,275,476; decayed 460,745; Prob(Persist) = 0.64; split on edge freshness (edate)
          higher freshness: N = 582,266; decayed 168,881; Prob(Persist) = 0.71
      non-reciprocated: N = 4,519,397; persistent 1,545,179; Prob(Decay) = 0.67
  The figure also shows a node with 3,079,967 decayed and 845,557 persistent edges (Prob(Decay) = 0.73).

The right-hand side of the figure shows that for edges with relatively weak directed weight, the odds of decay are relatively high (p = 0.59). If, in addition, the edge is non-reciprocal (with incoming directed strength even weaker or equal to zero), then the probability of decay rises concomitantly (p = 0.67). However, even with low levels of directed strength (c_ij ≤ 3), an edge characterized by reciprocity has a relatively decent chance of persisting in the next period (p = 0.57). If, in addition to this, the edge is on the "high side" of the corresponding weight cutoff (2 < c_ij ≤ 3) and it was also active later in the time period (has high freshness), then the probability of being classified in the persistent class improves substantially (p = 0.71).



11. Discussion and Conclusion 

In this paper we explore the short-term decay of cell-phone contacts as a problem of decay/persistence prediction: determining which local structural features best predict whether dyads that are connected during a given time window will be disconnected during the immediately adjacent time window. Using large-scale data on millions of dyads from a large non-U.S. cell-phone provider, we investigate how much empirical leverage we can gain on the decay prediction problem. Our analytic framework is guided by prior literature on the structural and vertex-level predictors of edge decay in informal social networks. Using observational data from call logs, we calculate features of ego-network range, communicative range, edge strength, reciprocity, embeddedness, edge newness and edge freshness.
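As an illustration of how such features and persistence labels might be derived from call logs, the sketch below assumes a hypothetical record format of (caller, callee, day) tuples covering two adjacent 28-day windows. The feature names, the use of first and last call day as newness and freshness proxies, and the rule that a directed edge from i to j "persists" if i calls j at least once in the second window are all assumptions for illustration, not the paper's exact definitions.

```python
from collections import Counter, defaultdict

WINDOW = 28  # assumption: each observation window spans 28 days (4 weeks)

def edge_features(calls):
    """Build window-1 features and a window-2 persistence label for each directed edge.

    `calls` is an iterable of (caller, callee, day) tuples with 0 <= day < 2 * WINDOW.
    """
    c = Counter()              # c[(i, j)]: calls from i to j in window 1
    first, last = {}, {}       # first and last call day of each directed edge in window 1
    later = set()              # directed edges with at least one call in window 2
    nbrs = defaultdict(set)    # undirected neighbour sets in window 1
    for i, j, day in calls:
        if day < WINDOW:
            c[(i, j)] += 1
            first[(i, j)] = min(first.get((i, j), day), day)
            last[(i, j)] = max(last.get((i, j), day), day)
            nbrs[i].add(j)
            nbrs[j].add(i)
        elif day < 2 * WINDOW:
            later.add((i, j))
    outdeg = Counter(i for i, _ in c)  # number of distinct callees per caller
    rows = {}
    for (i, j), c_ij in c.items():
        rows[(i, j)] = {
            "c_ij": c_ij,                                      # directed weight
            "c_ji": c.get((j, i), 0),                          # reciprocated weight
            "outdeg_i": outdeg[i],                             # caller's range
            "common_nbrs": len((nbrs[i] & nbrs[j]) - {i, j}),  # embeddedness
            "first_call": first[(i, j)],                       # newness proxy
            "last_call": last[(i, j)],                         # freshness proxy
            "persists": (i, j) in later,                       # label
        }
    return rows

# Toy usage with made-up records.
calls = [("a", "b", 1), ("a", "b", 3), ("b", "a", 5), ("a", "c", 2),
         ("b", "c", 4), ("a", "b", 30)]
rows = edge_features(calls)
print(rows[("a", "b")])
```

Keeping the label computation in the same pass as the features makes it explicit that every feature uses only window-1 calls, while the label uses only window-2 calls.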

In all, we took into account 15 vertex-level, dyadic, neighborhood-level and temporal features (e.g., edge weight, embeddedness, ego-network range and newness), most of which incorporated information on the relative frequency of interaction, and thus on the weight associated with each component arc in the cell-phone network (Barrat et al. 2004). The results support our emphasis on the importance of edge weights: according to the information gain metric (an information-theoretic measure of predictiveness), factors related to directed edge weight, essentially the measure of total directed communicative flow within the dyad, are more predictive of decay than any of the other types of factors. Our analysis of the correlation structure of the other types of features (vertex, dyad, neighborhood-level and temporal) with empirical indicators of edge weight suggested that, while there is a reasonable amount of correlation between edge weight and these other features, it is not strong enough to conclude that edge weight is a redundant by-product of other local structural factors. To explore the joint effect of the various features on edge decay, we built a decision-tree classifier and a logistic regression classifier and evaluated their effectiveness at predicting short-term decay in the cell-phone contact network. We found that both classifiers perform reasonably well.
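A minimal sketch of an information gain computation, assuming the standard entropy-based definition (a common decision-tree splitting criterion; the paper's exact implementation is not reproduced here). Applied to the root split of Figure 5, it measures how much knowing whether $c_{ij}$ lies above the cutoff reduces uncertainty about decay. Because this is the gain of a single binary split at the tree's cutoff, it need not equal the feature-level values reported in Table 3.

```python
from math import log2

def entropy(decayed: int, total: int) -> float:
    """Binary entropy, in bits, of the decayed/persistent mix within one node."""
    h = 0.0
    for k in (decayed, total - decayed):
        if k:
            p = k / total
            h -= p * log2(p)
    return h

def information_gain(parent, children):
    """Entropy reduction from splitting `parent` into `children`; each is (decayed, total)."""
    decayed, total = parent
    weighted = sum(n / total * entropy(d, n) for d, n in children)
    return entropy(decayed, total) - weighted

# Root split of Figure 5 on directed tie strength c_ij.
root = (4_672_437, 10_932_872)
high_cij = (639_263, 4_078_056)
low_cij = (6_854_816 - 2_821_642, 6_854_816)  # the figure reports persistent, not decayed, here
print(f"H(root) = {entropy(*root):.3f} bits")
print(f"IG of the c_ij split = {information_gain(root, [high_cij, low_cij]):.3f} bits")
```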

The logistic regression classifier results are consistent with what we know about the structural and temporal dynamics of relationship persistence and decay. Stronger ties are more likely to persist, and reciprocation increases persistence as well. While the overall calling activity of each of the actors involved in the dyad is not that important, the number of neighbors that they are connected to is: decay increases for outgoing ties originating from high-degree actors, but this effect is contingent on the number of neighbors of the target actor. This result implies that relative inequalities in network range can tell us something about the expected stability of edges in social networks, as the bulk of the "instability" in edge evolution may be accounted for by the activity of high-degree actors. This result is consistent with those obtained in networks constructed from email trace logs (Aral and Van Alstyne 2007; Kossinets and Watts 2006). Embeddedness is also important: ties embedded in triadic or larger structures are protected from fast decay. Finally, new ties are more likely to decay, while ties that have been active recently are more likely to persist.
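To make the interaction concrete: in a logistic model, a product term between caller out-degree and target degree lets the effect of the caller's range on the log-odds of decay vary with the target's range. The coefficients below are hypothetical, chosen only to mirror the signs described above (the sign of the interaction itself is an assumption), and are not the paper's estimates.

```python
from math import exp, log1p

# Hypothetical coefficients on the log-odds of decay; NOT estimates from the paper.
COEF = {
    "intercept": 0.5,
    "log_c_ij": -0.8,                  # stronger ties decay less
    "reciprocated": -0.5,              # reciprocated ties decay less
    "log_outdeg_i": 0.4,               # ties from high-degree callers decay more...
    "log_deg_j": 0.1,                  # main effect of the target's degree
    "log_outdeg_i:log_deg_j": -0.05,   # ...by an amount that depends on target degree (assumed sign)
    "common_nbrs": -0.2,               # embedded ties decay less
}

def decay_log_odds(c_ij, reciprocated, outdeg_i, deg_j, common_nbrs):
    """Linear predictor of a logistic model with a caller-degree x target-degree interaction."""
    lo_i, lo_j = log1p(outdeg_i), log1p(deg_j)
    return (COEF["intercept"]
            + COEF["log_c_ij"] * log1p(c_ij)
            + COEF["reciprocated"] * (1 if reciprocated else 0)
            + COEF["log_outdeg_i"] * lo_i
            + COEF["log_deg_j"] * lo_j
            + COEF["log_outdeg_i:log_deg_j"] * lo_i * lo_j
            + COEF["common_nbrs"] * common_nbrs)

def p_decay(c_ij, reciprocated, outdeg_i, deg_j, common_nbrs):
    """Logistic link: convert the log-odds of decay into a probability."""
    return 1 / (1 + exp(-decay_log_odds(c_ij, reciprocated, outdeg_i, deg_j, common_nbrs)))

# Effect of raising the caller's out-degree from 5 to 50, for a low- and a high-degree target.
for deg_j in (2, 500):
    shift = decay_log_odds(2, False, 50, deg_j, 0) - decay_log_odds(2, False, 5, deg_j, 0)
    print(f"target degree {deg_j}: change in log-odds of decay = {shift:+.3f}")
```

With these illustrative values the same increase in caller degree raises the log-odds of decay much more when the target has few neighbors, which is the kind of contingency the estimates describe.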

In addition, we show that the structure of the decision-tree classifier can provide useful insights into the relative importance of different vertex-level and dyad-level processes in determining the probability that particular types of edges in the cell-phone network (e.g., high- versus low-weight edges) will decay. The results of the decision-tree classifier are consistent with the initial feature-predictiveness results, showing which combinations of the high-information-gain features in Table 3 generate persistence and decay. As the decision tree shows, the most important predictors are directed edge strength, reciprocated edge strength and the freshness of the tie. So while network range, embeddedness and tie age can be used to predict persistence, as the logistic regression estimates and information gain statistics indicate, they are not the most important factors.

Phrased in terms of "rules," we can say that persistent edges in the cell-phone network are those characterized by high levels of interaction frequency coupled with relatively constant reactivation (freshness) of the edge over time. Edges at high risk of decay, on the other hand, are characterized by relatively low levels of interaction and nonreciprocity. Finally, a second path towards persistence appears to be characteristic of "nascent" edges, which have not yet had the opportunity to gain strength: here relatively weak flows are combined with reciprocity and recent activation to produce persistence in calling behavior, at least in the short term.
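These three paths can be written as explicit rules. The sketch below hard-codes the cutoffs and probabilities reported in the text and Figure 5; the function name and the treatment of "non-reciprocal" as $c_{ji} \le 0$ are assumptions (the exact split value on $c_{ji}$ is not given), and combinations not shown among the selected leaves return None.

```python
from typing import Optional

WEIGHT_CUTOFF = 3   # directed-weight cutoff reported in the text ("approximately 3")
RECIP_CUTOFF = 0    # assumption: c_ji <= 0 is treated as non-reciprocal

def rule_prob_persist(c_ij: float, c_ji: float, fresh: bool) -> Optional[float]:
    """Prob(Persist) of the Figure 5 node an edge falls into, or None if not reported."""
    if c_ij > WEIGHT_CUTOFF:
        # Rule 1: frequent contact; recent reactivation separates the two leaves.
        return 0.91 if fresh else 0.67
    if c_ji <= RECIP_CUTOFF:
        # Rule 2: weak and non-reciprocated; the figure reports Prob(Decay) = 0.67.
        return round(1 - 0.67, 2)
    if c_ij > 2 and fresh:
        # Rule 3: "nascent" ties: weak but reciprocated, near the cutoff, recently active.
        return 0.71
    return None  # this combination is not among the selected leaves

print(rule_prob_persist(5, 0, True))    # strong, refreshed tie
print(rule_prob_persist(1, 0, False))   # weak, non-reciprocal tie
print(rule_prob_persist(2.5, 1, True))  # nascent tie
```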

In terms of contemporary models of relationship evolution, this last result suggests that in order to persist, social relationships must first cross a boundary where the directed attachment between ego and alter becomes "synchronized." This implies that the observed strength of older relationships may be an outcome of the achievement of reciprocity at the early stages; thus, as Friedkin (1990: 241) notes, ". . . reciprocation and balance are crucial for both the occurrence and durability of a strong relationship." In this respect, while strong weight, and thus frequency of interaction (Homans 1950), is sufficient to guarantee a persistent (if in some cases asymmetric) relationship after a certain relationship-age threshold is crossed, reciprocity appears to be more important for the longer-term survival of weaker edges, especially in the nascent stages of the relationship (Friedkin 1990).

These time-dependent balance/strength dynamics therefore seem to us to deserve detailed consideration in future modeling efforts. In this paper we have attempted to establish the beginnings of a framework with which to rank the factors that differentiate links fated for quick dissolution from those that will become a more permanent component of the social structure.